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> ■ 

\0 ■ APPENDIX B: MULTIPROCESSOR AND PARALLELIZATION ISSUES 

o ■ 

\Q • Cactus (like most modern numerical relativity codes using 3-D grids), is designed to 

O , run in parallel on multiprocessor computer systems. CACTUS uses a domain-decomposition 
Q \ parallelization scheme, where each processor stores and computes the Einstein equations 

■ on its own "chunk" of the spatial grid. Neighboring chunks overlap slightly;^ CACTUS 
O^' "synchronizes" them as necessary. An AH may span multiple processors' grid chunks, and 
5-1 , since an AH may move during an evolution, in general we don't know in advance which 

processor(s) those are. 

' Because of the domain decomposition, the multiprocessor "global" interpolator used for 

^ ■ the geometry interpolation must in general send each interpolation point to the processor 
which "owns" that part of the grid, do the interpolation there, and send the results back 
to the requesting processor. To ensure that every processor has a flow of control in the 
interpolator code to (potentially) handle interpolation points in its chunk of the grid, the 
interpolation must be a collective operation: code on every processor must call the interpo- 
lator synchronously (each processor's code specifying its own choice of interpolation points). 
Violations of this requirement may result in deadlock in the interprocessor-communication 
code. 



Taking these environmental constraints into account, I have parallelized AHFinderDi- 
RECT in the following way: To allow the use of standard (uniprocessor) sparse matrix sub- 
routines for solving the Newton's-method updating routines, AHFinderDirect assigns 
each AH to a single processor,^ and searches for that AH only on that processor. However, 
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^ This is the 3-D Cartesian grid analog of the angular interpatch ghost zones described in the main printed 

version of this paper. 
^ A processor may be assigned multiple AHs if there are more AHs than processors. 



2 



Part (a): 2 processors 
processor 1 processor 2 
h/it what h/it what 



Part (b): 3 or more processors 
processor 1 processor 2 processor 3 any others 
h/it what h/it what h/it what h/it what 



1/1 e 2/1 e 

1/1 J 2/1 J 

1/2 e 2/2 e 

1/2 J 2/2 J 

1/3 _G 2/3 e 

—J 2/3 J 

3/1 e 2/4 e 

3/1 J 2/4 J 

3/2 e 2/5 _e 

3/2 J — J 

3/3 9 — e 

3/3 J — J 

3/4 e_ — e 



1/1 e 2/1 e 3/1 e — e 

1/1 J 2/1 J 3/1 J — J 

1/2 e 2/2 e 3/2 9 — e 

1/2 J 2/2 J 3/2 J — J 

1/3 G_ 2/3 e 3/3 9 — 9 

— J 2/3 J 3/3 J — J 

— 9 2/4 9 3/4 9_ — 9 

— J 2/4 J — J — J 

— 9 2/5 9_ — 9 — 9 



Abbreviations: 

h/it = horizon number /iteration number 
what = what is this processor doing? 



TABLE I: This table shows two examples of how AHFinderDirect finds multiple horizons in 
parallel in a multiprocessor environment, for the case where we search for 3 horizons, which are 
found (the Newton iteration converges, shown by the 9) after 3, 5, and 4 iterations respectively. 
The table rows show actions at successive iterations of the algorithm; " — " means a dummy com- 
putation (described in the main text). Part (a) shows how the algorithm would work with 2 
processors; part (b) shows how it would work with 3 or more processors (the last column refers to 
any processors other than first 3) 

if there are multiple AHs and multiple processors, AHFinderDirect searches for different 
AHs concurrently on the multiple processors. 

All the processors do their Newton iterations synchronously, each processor working se- 
quentially through its own assigned horizon(s), or doing dummy interpolator calls (to pre- 
serve the synchronization across all processors) if it has no assigned horizon(s). If/when 
a processor finishes with a horizon (either locating it or failing to locate it), the processor 
moves on to its next assigned horizon if there is one, or switches to doing dummy interpola- 
tor calls if it has no more assigned horizons left to process. Table |l] shows two examples of 
this.^ 

This algorithm requires an explicit global synchronization across all processors at each 
Newton iteration: After evaluating B, each processor computes a Boolean flag saying 
whether that processor needs to continue iterating (this may be true for either or both 
of two reasons: the Newton iteration hasn't converged yet on the current horizon, or there 
is another horizon or horizons assigned to this processor which hasn't yet been processed). 
All processors then broadcast their flags, and compute the inclusive-or of all the flags to 
determine whether to continue the algorithm or exit. 



^ In part (a) of tabled notice that after horizon 1 converges, processor 1 does a dummy J computation 
before starting on the next horizon. This is slightly inefficient, but considerably simplifies the algorithm 
by keeping the O and J computations synchronized across all processors. In the uniprocessor case this 
dummy J operation is unnecessary, and the algorithm omits it. 
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P ^ Pstart 
6p ^ Jpstart 
G <— Gstart 

while {\6p\ > tolerance) 
{ 

try to find a common AH in T,[p], using G as the initial guess 
if (found a common AH) 
then { 

G <— the common AH just found 

p <^ p + 5p 
} 

else { 

6p ^ ^6p 
p ^ p — 5p 
} 



FIG. 1: This figure shows the continuation-method binary search algorithm for finding the critical 
parameter p at which a common AH appears in a 1-parameter family of slices. 

APPENDIX C: SEARCHING FOR THE CRITICAL PARAMETER OF A 
1-PARAMETER INITIAL DATA SEQUENCE 

In this appendix I describe my continuation-method binary search algorithm for determin- 
ing the "critical" parameter at which a common AH appears/disappears in a 1-parameter 
family of initial data slices, p i— > S[p]. Without loss of generality I assume that small (large) 
values of p do (do not) have a common AH. 

The main complication here is that AHFinderDirect needs an initial guess for an AH 
shape, and if this initial guess is inaccurate AHFinderDirect may fail to find the (an) AH. 
This means that the obvious binary-search algorithm for finding isn't reliable, because a 
failure to find an AH doesn't rule out the possible existence of that AH. 

Instead, I use a continuation method, p is "walked up", using the common AHs found in 
smaller-p slices as initial guesses for trying to find the common AH in larger-p slices. If the 
algorithm fails to find a common AH, it decreases p and tries again with a smaller "walking 
increment" in p. Figure Q shows this algorithm in detail. 
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In 3 + 1 numerical simulations of dynamic black hole spacetimes, it's useful to be 
able to find the apparent horizon(s) (AH) in each slice of a time evolution. A number 
of AH finders are available, but they often take many minutes to run, so they're too 
slow to be practically usable at each time step. Here I present a new AH finder, 
AHFinderDirect, which is very fast and accurate: at typical resolutions it takes 
only a few seconds to find an AH to ~ 10"^m accuracy on a GHz-class processor. 

I assume that an AH to be searched for is a Strahlkorper ("star-shaped region") 
with respect to some local origin, and so parameterize the AH shape by r = /i(angle) 
for some single- valued function h: S"^ ^ 5?"*". The AH equation then becomes a 
nonlinear elliptic PDE in h on 5^, whose coefficients are algebraic functions of gij, 
Kij, and the Cartesian-coordinate spatial derivatives of Qij. I discretize 5^ using 
6 angular patches (one each in the neighborhood of the itx, ity, and itz axes) to 
avoid coordinate singularities, and finite difference the AH equation in the angular 
coordinates using 4th order finite differencing. I solve the resulting system of non- 
linear algebraic equations (for h at the angular grid points) by Newton's method, 
using a "symbolic differentiation" technique to compute the Jacobian matrix. 

AHFinderDirect is implemented as a thorn in the cactus computational 
toolkit, and is freely available by anonymous CVS checkout. 
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INTRODUCTION 



In 3 + 1 numerical relativity, it's often useful to know the positions and shapes of any 
black holes in each slice. These are both key physics diagnostics, and potentially valuable 
for helping choose the coordinate conditions in a numerical evolution. Moreover, black 
holes inevitably contain singularities, which may need to be excised from the computational 
domain ([6, 44]).^ Since the event horizon can be determined only once the entire future 
development of the slice is known, ^ i.e. only after a numerical evolution is done, in practice 
one usually uses the apparent horizon(s) as a working approximation which can be computed 
slice-by-slice while a numerical evolution is still ongoing. (Recall that an apparent horizon 
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^ For more recent work on this topic, see (for example) [2, 3, 11, 15]. 

^ For numerical purposes the usual approximate development to a nearly-stationary state suffices ([4, 13, 
21, 32]). 
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is always contained inside an event horizon, and they coincide if the spacetime is stationary 
([26]).) Apparent horizons are also interesting due to their close relationship to isolated 
horizons, which have many useful properties ([7, 22] and references therein). 

There has thus been longstanding interest in algorithms and codes to find apparent 
horizons in numerically computed spacetimes (slices). Here I focus on the case where there 
are no continuous symmetries such as axisymmetry, and where the spatial grid is Cartesian. 
Many researchers have developed apparent horizon finding algorithms and codes for this 
case, for example [5, 10, 25, 27, 28, 30, 37, 41, 42, 45, 51]. However, with the exception of 
[41, 42],^ the existing numerical codes for apparent horizon finding are generally very slow, 
often taking several minutes to find each apparent horizon even on modern computers. This 
is a serious problem, since we would ideally like to find apparent horizons at each time step 
of a numerical evolution, and there may be tens of thousands of such time steps. 

In this paper I describe a new numerical apparent horizon algorithm and code (based on 
a generalization of the algorithm and code I described previously for polar-spherical grids 
([48])) which is very fast: for typical resolutions it takes only a few seconds to find an 
AH, so it's practical to run it at every time step of a numerical evolution. This apparent 
horizon finder is also very accurate, typically finding apparent horizons to within a few tens 
of parts per million in coordinate position, with similar accuracies for derived quantities 
such as the apparent horizon area, irreducible mass, coordinate centroid, etc. This apparent 
horizon finder is implemented as a module ("thorn") AHFinderDirect in the CACTUS 
computational toolkit ([24], http://www.cactuscode.org), and is freely available (GNU 
GPL licensed) by anonymous CVS checkout. 

In the main body of this paper I give a relatively high-level description of AHFinderDi- 
rect's algorithms; in the appendices I discuss various technical issues in more detail. 

A. Notation 

I generally follow the sign and notation conventions of [53]. In particular, I use the Penrose 
abstract-index notation, with indices i-m running over the (Cartesian) spatial coordinates 

= (x, y, z) in a (spacclikc) 3 + 1 slice, gij is the 3-metric in the slice, with associated 
covariant derivative operator Vj. Kij is the extrinsic curvature of the slice (I use the sign 
convention of [54], not that of [53]) and K = Ki' is its trace. r]ij is the fiat 3-metric. Indices 
uvw run over generic angular coordinates = (p, a) on the apparent horizon surface. "N- 
D" abbreviates 'W-dimcnsional" . In cases where the distinction is important, I use a prefix 

to denote quantities defined on a 3-D neighborhood of the apparent horizon surface. 

Small-capital indices UK label angular grid points on the apparent horizon surface, and 
h[l] is the evaluation of a grid function h at the grid point I. and are finite difference 
molecules discretely approximating the angular partial derivatives and duv respectively. If 
m is an index into a finite difference molecule M, then M[ni] is an individual molecule coefficient, 
and m G M means that this coefficient is nonzero. Molecule indices may be obtained by 
subtracting grid point indices (m = J — l) , or correspondingly the sum of a grid point index 
and a molecule index gives a grid point index (j = I -|- m). 



^ Schnetter ([41, 42]) has developed an apparent horizon finding algorithm and code quite similar to mine. 
His work and mine were done independently; neither of us learned of the others' work until our own was 
mostly complete. 
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II. APPARENT HORIZONS 

Given a (spacelike) 3 + 1 slice, a "marginally trapped surface" (MTS) is defined as a 
closed spacelike 2-surface in the slice, whose future-pointing outgoing null geodesies have 
zero expansion 0. In terms of the usual 3 + 1 variables this condition becomes ([55]) 

= ViTi' + Kijn'n^ -K = (1) 

where rf is the outward-pointing unit normal to the surface. 

An "apparent horizon" (AH) is then defined as an outermost MTS in a slice (there may 
be multiple MTSs nested inside each other). In this paper I actually describe an algorithm 
and code for locating MTSs, but since the primary application will be the location of AHs, 
for convenience of exposition I refer to the MTSs as AHs. 

As is common in AH finding, I parameterize the AH surface by first choosing a local 
coordinate origin Xq inside the AH, then assuming that the horizon is a "Strahlkorper" 
("ray body", or more commonly "star-shaped region") about this point. A Strahlkorper is 
defined by Minkowski ([43, p. 108]) as 

a region in n-D Euclidean space containing the origin and whose surface, as seen 
from the origin, exhibits only one point in any direction. 

I take = (p, a) to be generic angular coordinates on the AH surface (or cquivalently, 
on the unit 2-sphere S'^). Given these, I then define the AH shape by r = h{p,a), where 

r = [^j(a;* — x^Y]^^^ is a radial coordinate around the local coordinate origin,^ and the "AH 
shape function" h: —>■ Sf?"*" is a single- valued function giving the radius of the AH surface 
as a function of angular position about the local coordinate origin. 



III. COMPUTING THE EXPANSION (CONTINUUM) 

To write the expansion (and thus the AH equation (1)) explicitly in terms of this 
parameterization, i.e. in terms of /I's 1st and 2nd angular derivatives duh and duvh, I first 
define a scalar function which vanishes on the AH surface and increases outwards, ^^''F = 
r — h{p, a). I then define a (non-unit) outward-pointing normal covector to the AH surface 
as the gradient of this scalar function. 



Si ^ ^ V/^)F (2) 

since F is a scalar (3) 

= dir - dih (4) 

^^-X^duh, (5) 
where I define the coefficients Xf = dy'^/dx'^. It's then straightforward to show that 

d,sj = ^ - X" 1^ - XrX] , (6a) 



Note that I define r to be the flat-space distance from Xq to - there's no use of the S-metric here. 
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where 




if i 



and where I also define the coefficients X^^ = d'^y'^ /dx'^dx^ . 
The outward-pointing unit normal to the AH surface is then 



n 



so the expansion © is given by 

e = Viu' + KijTi'n^ - K 

= diu'' + {di In ^/g)n^ + Kij-rin^ 



ABC 
+ 



-K 

g'^Sj 



+ 



-K 



^3/2 



H K 



where 



A = - {9''sk){g^'se)d,s, - l{g'^s,)[{d,g''')skse 

B = {dig'^)sj + g'^diSj + {di In V^W'sj) 
C = K'^SiSj 



D 



g'^SiSj . 



(6b) 



(7) 

(8) 
(9) 

(10) 
(11) 



(12a) 

(12b) 
(12c) 
(12d) 



Setting r — h in the definitions (5) and (6) and substituting into (11) and (12) gives 
explicitly in terms of h and its 1st and 2nd angular derivatives, so the AH equation (1) takes 
the form 

= Q{h, duh, duvh] gij, Kij, dtgij) = (13) 

where the dependence on gij, Kij, and dkgij is implicit through their position dependence 
(this is discussed in detail in section VI A) . 



IV. SOLVING THE APPARENT HORIZON EQUATION 

I view the AH equation (13) as an elliptic PDE for hon S"^, and discretize it using standard 
finite differencing methods: 1 introduce a total of A^ang angular grid points {(pi,(Ti)} on S^, 
and represent h and by their values {hi} and {0i} at these points. Approximating the 
angular derivatives duh and d^vh by finite differencing, (13) then becomes a set of A^ang 
nonlinear algebraic equations {0, = 0} for the Aang {^i} values. 

I solve these equations by Newton's method in Aang dimensions. This in turn has several 
subparts: 

• The actual Newton's-method iteration algorithm 

• Computing the (discrete) expansion {0i} given a (discrete) trial AH shape {hi} 
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• Computing the Jacobian matrix Ju = dQi/dhj given a (discrete) trial AH shape {h^} 

• Solving the Newton's-method updating equations J ■ 5h — — © 
I describe these in detail in the following sections. 

V. NEWTON'S METHOD 

The basic multidimensional Newton's-method algorithm is well known (see, for example, 
[47, section 5.3]), but several refinements are necessary for a practical AH finder: 

To make Newton's method converge more robustly if the initial guess is poor, and to limit 
divergence if the iteration doesn't converge, AHFinderDirect limits any single Newton 
step to have an oo-norm over the angular grid which is no more than a specified maximum 
fraction (10% by default) of the mean horizon radius. 

Much more sophisticated "modified Newton" algorithms could be used to achieve faster 
or more robust convergence (eg. [8, 9, 29, 36, 38-40]), but in practice this hasn't been 
necessary.^ In particular, the high-spatial-frequency convergence problems 1 have previously 
described for Newton's-method apparent- horizon-finding ([48]), don't seem to occur often 
in practice. 

If the slice doesn't contain an AH (or if either the 3-D Cartesian grid or the S"^ angular 
grid has insufficient resolution), then the Newton iteration will probably fail to converge. In 
practice, AHFinderDirect detects this by limiting the Newton iteration to a maximum 
number of iterations. It's useful to distinguish between two subcases here: 

• If we're searching for an AH or AHs at each time step of a numerical evolution, and 
we found this AH at the previous time step, then that AH shape probably provides an 
excellent initial guess for this step's Newton iteration, so a relatively low maximum- 
iterations limit is appropriate. AHFinderDirect uses a default of 10 iterations for 
this case. 

• Otherwise (if we do not have a previous-time-step AH as an initial guess), in practice 
the initial guess is likely to be rather inaccurate, so a higher maximum-iterations limit 
is appropriate. AHFinderDirect uses a default of 20 iterations for this case. 

In addition to the maximum-iterations limit, AHFinderDirect also aborts the finding 
of an AH if any trial horizon shape {h^} is outside the 3-D Cartesian grid. Otherwise, 
AHFinderDirect considers an AH to have been found if and only if the oo-norm of the 
{Gi} values over the angular grid is below a specified threshold (10~^ by default). 

For better efficiency, in a multiprocessor environment AHFinderDirect finds multi- 
ple AHs in parallel across multiple processors. I describe the algorithm for doing this in 
appendix B. 



^ Additionally, due to the way AHFinderDirect finds multiple AHs in parallel across multiple processors 
(discussed in detail in appendix B), it would be difficult to use many of the uniprocessor modified-Newton 
software packages such as [36, 38, 39] . 
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VI. COMPUTING THE EXPANSION 9 (DISCRETE) 

Given a trial AH shape {hi}, I compute the expansion {0i} using (13), approximating the 
angular derivatives duh and duvh by the usual centered 4th order finite difference molecules 
D„ and Duv respectively. However, there are several complications in this process, which I 
discuss in the following subsections. 

A. Geometry Interpolation 

As shown in section III, the expansion implicitly depends on the geometry variables 
Qij, Kij, and dhQij at the AH surface. In practice the geometry variables are only known on 
a (3-D) Cartesian grid, so they must be interpolated to the AH surface. 

Instead of computing the 3-metric derivatives dtgij on the full 3-D grid and then inter- 
polating these values to the AH surface, it's much more efficient to do the differentiation 
only at the AH-surface points, inside the interpolator: An interpolator generally works by 
(conceptually) locally fitting a fitting function (usually a low-degree polynomial) to the data 
points in a neighborhood of the interpolation point, then evaluating the fitting function at 
the interpolation point. By evaluating the derivative of the fitting function, the dkgij values 
can be interpolated very cheaply, using only the 3-D input values which are used anyway 
for interpolating the gij. 

Even for C°° gij and Kij, the usual Lagrange polynomial interpolators give results which 
are continuous, but not differentiahle (the interpolated dkgij generically has a jump discon- 
tinuity) at each position where the interpolator switches the set of input 3-D grid points it 
uses. (The non-smoothness of interpolation errors is discussed in more detail in [49, Ap- 
pendix F].) Unfortunately, this lack of smoothness propagates into the AH equation (13), 
sometimes causing Newton's method to fail to converge. To avoid this problem, I use a 
(cubic) Hermite interpolator, which guarantees that the interpolated gij and Kij remain dif- 
ferentiahle, and that the interpolated d^gij remains continuous, even when the interpolator 
switches input-grid-point sets. Figure 1 shows an example of the smoothness properties of 
Lagrange and Hermite interpolation, for a simple 1-D model problem. 

While the resulting (C°) smoothness of Q{h) isn't quite ideal for Newton's method, in 
practice it seems to be sufficient not to impair convergence, and attaining a higher degree 
of smoothness would require a significantly more complicated and expensive interpolator. 

B. Multiple Grid Patches 

To avoid z axis coordinate singularities in the angular computations, I use multiple grid 
patches to cover S"^. Figure 2 shows an example of this. In general there are 6 patches, 
covering neighborhoods of the ±a;, ±y, and ±2; axes respectively. 

Each patch's nominal grid (shown in thick lines in figure 2) is surrounded by a "ghost 
zone" (shown in thin lines in figure 2). Once the h values in the ghost zones are filled in 
by symmetry operations and/or interpatch interpolation, the finite differencing code can 
ignore the patch boundaries in computing 0. To keep the interpatch interpolation errors 
(more precisely, their numerical 2nd derivatives) from dominating those of the 4th order 
patch-interior angular finite differencing, I use 5th order Lagrange polynomial interpolation. 
The patch coordinates (p, a) are defined such that adjacent patches always share a common 
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FIG. 1: This figure shows the errors for cubic Lagrange and Hermite interpolation of the function 
f(x) = exp[sin(27rx)] with grid spacing Ax = 0.1. Notice that the Lagrange error (and hence the 
Langrange interpolant itself) is non-differentiable at the grid points, whereas the Hermite error 
(and interpolant) is differentiable everywhere. 



angular coordinate, so only 1-D interpolation is required here. I describe the multiple-patch 
scheme in detail in appendix A. 

The Jacobian matrix Ju must also take into account the ghost-zone symmetry operations 
and interpatch interpolations. This is conceptually simple, but does require explicitly know- 
ing the Jacobian (i.e. the interpolation coefficients) of the interpatch interpolation. The 
details are somewhat complicated, and are described in appendix A 3. 

The multiple-patch scheme works well, but requires a lot of subtle coding, particularly 
in handling the ghost-zone updates near patch corners. The overall patch infrastructure is 
currently about 12K (7K non-blank non-comment) fines of C-l-l- code, out of a total of about 
25K (15K) fines of C-l-h and 2.5K (1.5K) of Maple in AHFinderDirect. In hindsight, a 
much simpler scheme might well have sufficed to avoid z axis problems. Notably, [41, 42] 
reports excellent results using a simple latitude- longitude grid on S^, witfi tfie grid points 
staggered across the north/south poles. Another possibility ([23]) would be to have 2 patches 
meeting at the equator, each using stereographic coordinates. 



VII. COMPUTING THE JACOBIAN MATRIX 

If there are A^ang angular grid points, then the Jacobian matrix Ju = dQi/ dh, is an 
Aang X A^ang matrix; J is sparse due to the locality of the angular finite differencing. The 
obvious way to compute J is by numerical perturbation: perturb h at a single angular 
grid point J, then re-evaluate and determine the Jth column of J from the changes in ©. 



^ An important optimization is to only re-evaluate © within an angular-molecule-sized neighborhood of the 
perturbed point J. 
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FIG. 2: This figure shows a multiple-grid-patch system covering the (-I-, -|-, -|-) octant of with 
3 patches, at an angular resolution of 5°. The +z, +x, and +y patches are shown in red, green, 
and blue respectively. Each patch's nominal grid is shown in thick lines; the ghost zones are shown 
in thin lines. 

However, for typical A^ang values of 300-3000, this is very slow (though its relative simplicity 
makes it useful for debugging purposes). 

Instead of numerical perturbation, AHFinderDirect normally uses the "symbolic dif- 
ferentiation" algorithm of [48, section VI] to compute J directly from the angular du and duv 
finite difference molecule coefficients and the (continuum) Jacobian coefficients dQ/d{duh) 
and dQ/d{duvh). Temporarily neglecting the interpatch interpolation, the Jacobian is thus 
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given by 



Ju = ^ = < d{duh) 



otherwise 

r n .p I r n ^ T 

D„^,[j-l] ifj-lGD„^l \ o,B itl = j 



r, 4.U ■ U otherwise 

U otherwise 

where the first two terms describe the variation in © at a fixed spatial position with respect 
to h, and the last term describes the variation in due to a change in h changing the 
evaluation position of - and thus the position-dependent coefficients in - 0. Notice that 
there is no term here for dQ / dh, since this dependence is included in the 9^0 term. 

As mentioned in section VI B, the Jacobian (14) must be modified to take into acount the 
ghost-zone symmetry operations and interpatch interpolations. This is described in detail 
in appendix A 3. 

Because depends on gij, Kij, and dkQij (cf. (13)), in theory the 9^0 term in (14) also 
requires interpolating dkKij and dkeQij (cf. section VIA). However, doing the computation 
this way would require a much larger number of interpolations (a total of 80 geometry- 
interpolator outputs instead of 30), and the expressions for computing 9^0 from the inter- 
polated values would be quite complicated.^ 

To avoid these problems, 1 approximate 9^0 by a one-sided radial finite difference, 
drQ ^ [Q{h + e) - 0(/i)] /s, with e typically chosen to be 10"^ ([16], [47, pp. 266-267]). 
Even though this approximation is only 0{e) accurate, in practice this doesn't impair the 
convergence of Newton's method, and it's fairly cheap to compute (one extra 0(/i) evaluation 
per Jacobian computation). 



VIII. SOLVING THE LINEAR SYSTEM J • dh = 

The Jacobian matrix is an A^ang x A^ang sparse matrix; for typical angular resolutions Aang 
is in the range 300-3000. Thus for good efficiency it's important to exploit J's sparsity in 
both storage and computation. 1 have tried several different linear-equation codes and 
storage formats: For debugging purposes I have found it very useful to store J as a dense 
matrix and solve the linear system with LAPACK routines.^'^ For better efficiency I now 
use either an incomplete-LU-decomposition preconditioned conjugate gradient code ILUCG 



The arguments of section VI A would suggest also having the geometry interpolator guarantee at least 
continuity of the 2nd derivative values here, although it's not clear if this would actually be necessary in 
practice. 

^ LAPACK is available from Netlib (http://www.netlib.org). 

^ LAPACK's condition number estimator is a particularly valuable debugging and diagnostic tool. For 
example, incorrect symmetry boundary conditions often result in J being singular (infinite condition 
number). Another example was in investigating why the Newton iteration sometimes failed to converge 
in an early version of AHFinderDirect which used Lagrange rather than Hermite interpolation for the 
geometry variables (cf. section VIA): it was useful to be able to rule out ill- conditioning of the linear 
system as a possible cause of the convergence failure. 
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([31]), or the UMFPACK sparse LU-decomposition code ([17-20]);^°'^^ both of these codes 
use the standard "compressed row storage" sparse storage scheme for J. Neither code has 
been entirely satisfactory, so I plan to explore other sparse LU-decomposition codes in the 
near future. 

IX. PERFORMANCE AND ACCURACY 

In this section I outline the general factors affecting AHFinderDirect's performance 
(how quickly it can find an AH, or try to find one) and accuracy (how accurately is an AH 
found). I also briefly compare AHFinderDirect to other AH flnders in these respects. I 
defer detailed numerical results to section X. 

A. Performance 

AHFinderDirect's performance (the time taken to find, or try to find, an AH) de- 
pends on two main factors: the total number of angular grid points in the multiple-patch 
system, and the number of Newton iterations. Since there are no computations done at each 
Cartesian-grid grid point, the performance is almost independent of the size and resolution 
of the Cartesian grid.^^ 

The total number of angular grid points, iVang, is determined by the angular resolution 
chosen, and whether there are any discrete symmetries in the multiple-patch system. Since 
practical values of Aang vary over roughly an order of magnitude, and empirically the per- 
formance scales very roughly as -/Vang; the performance varies over a wide range from this 
factor alone. 

The number of Newton iterations performed by AHFinderDirect is mainly determined 
by the type of AH being searched for: 

• AHFinderDirect is fastest when searching for - and successfully finding - an AH 
at each time step of a numerical evolution. In this case the AH typically only moves 
a small distance from one time step to the next, so (using the previous time step's 
position as an initial guess for the Newton iteration, cf. section V) typically only 
3 Newton iterations are needed to locate it at each time step. 

• If AHFinderDirect finds an AH in an initial data slice, typically the initial guess 
is much less accurate, so 6-10 Newton iterations are needed. 

• AHFinderDirect is at its slowest when searching for - but failing to find - an AH at 
each time step of a numerical evolution. In this case (again cf. section V) it typically 
takes 20 Newton iterations at each time step. 



UMFPACK is available from http://www.cise.ufl.edu/research/sparse/umfpack. 
UMFPACK also has a condition number estimator, but as of version 4.0 it appears to be unreliable. 
On an idealized computer there would be no Cartesian-grid resolution dependence at all, but on actual 
computers cache effects in the geometry interpolator may cause a slight slowdown at higher Cartesian-grid 
resolutions. 
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As discussed in appendix B, AHFinderDirect can search for multiple AHs in parallel 
on a multiprocessor computer system. In practice, for large-scale runs there are usually 
(many) more processors available than the number of AHs being searched for. Assuming 
this, the elapsed time taken to search for all the AHs in parallel is basically the maximum 
of the time taken to search for each individual AH; this is roughly independent of both the 
number of AHs searched for, and the number of processors available. Part (b) of the table 
in appendix B should make this clearer. 

B. Accuracy 

The accuracy with which AHFinderDirect can find an AH is mainly determined by 
the finite differencing errors in the evaluation of the expansion 0. There arc two main 
error contributions: the geometry interpolation from the Cartesian grid to the AH position, 
and the angular finite differencing within the multiple-patch system on 5*^. (Other error 
sources such as the interpatch interpolation, the nonzero ||0|| at which the code considers 
the Newton iteration to have "converged" , and floating-point roundoff errors, are generally 
negligible in comparison to the main finite differencing errors.) 

For given (smooth) Qij and Kij, the errors from the geometry interpolator arc determined 
by the 3-D (Cartesian) grid spacing Axyz, and by the order of the interpolation scheme. 
In the limit of small Axyz, a cubic Hermite geometry interpolator gives gij and K^j to 
0(^{Axyz)'^^ and dkgij to 0(^{Axyz)^), contributing 0[{Axyz)^) errors to 0. However, 
at practical resolutions of Axyz ~ 0.03m-0.1m I find that the convergence is often 0.5- 
1.0 power of Axyz better than this, only dropping to the theoretical limits for very high- 
resolution grids (in practice, Axyz ^ 0.01m). 

AHFinderDirect uses 4th order angular finite differencing within the multiple-patch 
system on S^, which contributes 0((Apcr)^) errors to 0, where Apa is the angular resolution. 

C. Comparison to Other AH-Finding Methods 

Curvature-flow or fast-flow methods are widely used for AH-flnding (see, for example, 
[25, 30, 45, 51]). Conceptually, a flow method starts with a large 2-surface, and flows this 
inwards, in such a manner than the flow velocity vanishes on the AH. Unfortunately, this 
means that the method must move the 2-surface through a large part of the 3-D grid - and 
thus must do nontrivial computations at a large number of 3-D grid points - before the 
surface can closely approximate the AH. In contrast, an elliptic-equation method such as 
that used by AHFinderDirect need only do computations on a 2-D set of (AH-surface) 
grid points, so it can potentially be must faster. 

However, a flow method can (at least modulo numerical errors) guarantee to find the 
outermost MTS in a slice, whereas an elliptic-equation method is only locally convergent, 
and hence offers no information on what other MTSs might be outside any "AH" it finds. 

Another common class of AH-finding methods are function-minimization methods such 
as those described by [5, 10]. These parameterize a trial AH surface by spherical harmonic 
or other spectral coefficients, define a surface-integral error norm J 0^ dA which has a global 
minimum of at the AH surface, then use a general-purpose function-minimization algorithm 
to minimize the error norm over the surface-coefficient space. These methods are inherently 
quite slow because (for a generic slice with with no continuous symmetries) they must 
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determine a fairly large number of surface coefficients, and the generic function-minimization 
algorithm only "learns" a single number (the error norm) for each surface evaluation, and 
thus requires many surface evaluations to converge. For example, using a spherical harmonic 
expansion up to order N to parameterize the AH surface, there are O(iV^) surface coefficients, 
so 0{N'^) iterations are needed to converge. Each iteration takes 0{N'^) work to evaluate 
the surface integral, so the total work is 0{N^). The exponential convergence of spectral 
series allows N to be chosen to be fairly small for a given surface accuracy, but in practice 
function-minimization AH finders are still very slow. 

Minimization methods are also inherently somewhat limited in their accuracy, because 
the location of the error norm's minimum is very sensitive to small numerical errors. (In 
general, relative errors of 0{e) in a smooth function result in relative errors of 0{-\/e) in the 
location of the function's minima.) 

D. What makes AHFinderDirect Fast? 

Based on the above analyses, I think the key algorithm component which makes AHFind- 
erDirect fast is the posing of the AH equation (1) as an elliptic PDF on S'^ for the AH 
shape function h. Given this, 1 believe that any efficient implementation would result in an 
AH finder with roughly the same performance and accuracy as AHFinderDirect. 

A notable example of this is Schnetter's AH finder ([41, 42]), which poses the AH equation 
in the manner as mine, but uses a rather different finite differencing scheme and solution 
method for the finite difference equations. We have not yet made a detailed comparison of 
our AH finders, but it appears they are broadly comparable in performance and accuracy. 

Huq's AH finder ([27, 28]) also poses the AH equation as an elliptic PDF on 5^, but he 
uses Cartesian-grid finite differencing to evaluate the surface expansion and Jacobian ma- 
trix, rather than the angular-grid finite differencing which Schnetter and I use. Because of 
this, and because he uses numerical perturbations to compute the Jacobian matrix (cf. sec- 
tion VII), Huq's AH finder is roughly an order of magnitude slower than mine. 

X. SAMPLE RESULTS 

In this section I present various sample results to test and demonstrate AHFinderDi- 
RECt's performance. For comparison, I also show some results for another AH finder imple- 
mented in the Cactus toolkit, the fast-flow method of [25].^^ (This was the main Cactus 
AH finder prior to AHFinderDirect.) Although some of the test slices are in fact axisym- 
metric, I configured both AH finders to treat the shces as fully 3-D, with only the discrete 
symmetries of reflection across the x, y, and/or z = planes as appropriate. All timings are 
user-mode CPU times on a 1.7 GHz dual Pentium IV processor system (256 KB cache per 
processor) with 1.0 GB of memory. 



Cactus thorn AHFinder, slightly modified to allow a spherical harmonic expansion up to degree tmax = 50 
(by default the limit is 19). (As discussed below, in practice AHFinder is limited to imax ^ 20.) 
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A. Boosted Kerr Slices 

As a first test case, I first consider Kerr spacetime in Kerr-Schild coordinates ([35, exer- 
cise 33.8]), where the AH is a coordinate eUipsoid with radia (semi- major axes) 

= (1 + Vl-a2)m (15a) 




(15b) 



and area 

A = 47r(r^ + a^m^) (15c) 

where a — J/m^ is the black hole's dimensionless angular momentum. I then Lorentz-boost 

this with a velocity v in the x direction. The horizon area is invariant under the boost, but 
in the code's coordinate system, length-contraction makes the AH a triaxial ellipsoid, and 
the interaction of the black hole's spin and the boost results in the slice not being symmetric 
across either the x = or the y — Q plane. 

Table I shows the accuracy and performance of AHFinderDirect and the fast flow 
AH finder on various boosted-Kerr slices, for a number of choices of the various numerical 
parameters. 

The first section of the table shows AHFinderDirect's behavior as the resolution of the 
underlying Cartesian grid is varied, using the default cubic Hermite geometry interpolator. 
At very low resolution {Axyz = 0.2) AHFinderDirect fails to find the AH, due to the 
geometry interpolation "seeing" the Kerr ring singularity. At higher resolution (decreasing 
Axyz) the accuracy improves rapidly, until it levels out at high resolutions due to the angular 
finite differencing errors. For the computer system used here, the time taken to find the AH 
is essentially independent of the Cartesian grid resolution. 

The second section of the table shows AHFinderDirect's behavior as the resolution of 
the underlying Cartesian grid is varied, using a lower-order (quadratic) geometry interpola- 
tor. Compared to the default (cubic) geometry interpolator, this makes AHFinderDirect 
a factor of 2 to 3 faster, and roughly an order of magnitude less accurate. Also, at the very 
lowest resolution AHFinderDirect is now able to find the AH, when it couldn't find it 
using the cubic interpolator. 

Comparing the first two sections of the table shows that changing the interpolation or- 
der seems to make only a minor difference to the fast fiow method's behavior; all the re- 
maining tests use its default (quadratic Lagrange) geometry interpolator. As discussed in 
section IX C, the fast flow method becomes much slower at high Cartesian grid resolutions. 

The third section of the table shows AHFinderDirect's behavior as the angular resolu- 
tion is varied. As the resolution is increased (decreasing Apcr) AHFinderDirect becomes 
slower but more accurate, until the error levels off at high angular resolutions due to the 
Cartesian-grid geometry interpolation errors. 

The third section of the table also shows the fast flow method becoming slower as its 
resolution parameter i-^ax is increased. Unfortunately, beyond £max ~ 10 the method's accu- 
racy stops improving and begins to worsen, and beyond ij^ax ~ 20 the fast fiow method fails 
to find the AH. I suspect this is due to numerical ill-conditioning, but I haven't investigated 
this in detail. 

The fourth section of the table shows AHFinderDirect's behavior when the local 
coordinate origin is offset from the coordinate origin. Notice that the accuracy with which 
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Fast Flow 



Table columns not described in the main text: 

origin The (x, y) components of the local coordinate origin (the z component is always 0) 
interp The geometry interpolator: 

H(L) means Hermite (Lagrange) polynomial interpolation, 

the following integer gives the order 

time user-mode CPU time in seconds 

||<5/i||rins The rms-norm over the angular grid of the error in the computed AH radius h 

{5 A) /A The relative error in the computed AH area 

^max The maximum order of the spherical harmonic expansion 



Tast-flow initial guess changed to a coordinate sphere of radius 2.5, and Cartesian grid enlarged to size ±4 
(the larger Cartesian grid size points should have only minimal effects on AHFinderDirect's performance, 
but should slow the fast flow method by a factor of (4/2.5)'^ ~ 4) 
''AHFinderDirect initial guess changed to a coordinate ellipsoid of radia (0.5, 1.5, 1.0) 
'^AHFinderDirect initial guess changed to a coordinate ellipsoid of radia (0.3, 1.5, 1.0) 
"^Cartesian grid shrunk to size ±2 to reduce memory usage (this should have only minimal effects on 
AHFinderDirect's performance) 

TABLE L This table shows the accuracy and performance of AHFinderDirect on various boosted 
Kerr slices. In each case the black hole has dimensionless rest mass m = 1. Except as noted, the 
Cartesian grid is of size ±2.5 (more precisely, [—2.5, +2.5] in x and y and [0, 2.5] in z, with z —z 
reflection symmetry across the z = plane). Except as noted, the AHFinderDirect initial 
guess is a coordinate sphere of radius 1.5, and the fast-flow initial guess is a coordinate sphere of 
radius 2. AHFinderDirect used the ILUCG sparse matrix routines in all cases. In most cases 
the oo-norm error in the AHFinderDirect AH shape was less than twice the rms-norm error 
shown here; in no case did it exceed 5 times the rms-norm error. 
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the AH is found isn't significantly changed, and the time taken to find the AH is only mildly 
increased, even when the local coordinate origin is offset by up to | the AH radius. The 
fast flow method is still able to find the AH with the offset local coordinate origins, but it 
requires changes to the initial guess, and (even after correcting for the larger grid) it slows 
dramatically and becomes less accurate. 

The final section of the table shows AHFinderDirect's behavior on some more difficult 
boosted-Kerr slices, where the spin is closer to maximal and/or the boost is larger. Because 
the ring singularity in Kerr moves closer to the AH at high spins, and length contraction 
makes the AH strongly triaxial at high boosts, these tests used higher Cartesian and angular 
resolutions than the previous tests. AHFinderDirect still finds the horizon rapidly and 
with high accuracy in these cases, although in the two most difficult cases quite good initial 
guesses were required. The fast flow method is not able to find the AH for any of these 
cases, even with some adjustment of its initial guesses (this may be due in part to its user 
interface only allowing for axisymmetric initial guesses). 

Across all the boosted-Kerr tests, AHFinderDirect is roughly an order of magnitude 
faster, and two orders of magnitude more accurate, than the fast flow method. 



B. Misner and Brill-Lindquist Slices 

The Misner ([33, 34]) and Brill-Lindquist ([12]) initial data shces are standard test 
problems in numerical relativity. Both are time-symmetric (Kij = 0), 3-conformally-flat 
{gij = "^Tjij for some spatially- varying conformal factor ^), and (for suitable values of their 
parameters) may contain any number > 1 of black holes. 

The simplest case of Misner data (and the only case I consider here) is that of 2 throats, 
each of bare mass unity. Here the conformal factor is 




where 

r± = + y2 + ± coth(n/i)]' . (17) 

with /X > is a real parameter. The individual throats are located at coordinate positions 
(0, 0, ± coth /x). For small n there is only a single AH enclosing both throats, while for large n 
there are individual AHs enclosing each throat, but no common AH enclosing both throats. 
The conformal factor for A^-throat Brill-Lindquist initial data is 

*(^) = l+2E^e^ (18) 

where the ith throat has bare mass rrii and is located at the coordinate position Xi. Here I 
consider the cases N — 2 and N — 3, where the throats each have bare mass unity, and are 
uniformly spaced in a coordinate circle of radius R > 0. Similarly to the Misner data, for 
small R there is a single common AH, while for large R there are N individual AHs but no 
common AH. 

The AH-finder test problem I consider here is to numerically determine the "critical" 
value of the parameter (// for Misner, R for Brill-Lindquist) at which the common horizon 
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Test Problem 



Alcubierre et al. AHFinderDirect 
parameter critical parameter critical parameter critical AH area 



Misner 

Brill-Lindquist 2-throat 
Brill-Lindquist 3-throat 



H 1.364 1.365 071172 ±3 409.549 358 ±3 

R 0.766 0.766 197 45 ±5 196.407951 ±3 

R 1.18 ±4 1.195 499 53 ±5 444.756 224 ±3 



TABLE II: This table shows the maximum Misner /x and Brill-Lindquist R for which AHFind- 
erDirect found a common AH. along with the area of that common AH. All uncertainties are in 
units of the last digits shown. For the 2-throat Brill-Lindquist data, other values in the literature 
include R = 0.767ibl ([52]) and R = 0.768 ([45]). However, [45] report a critical AH area for 
this case of 184.16, about 6% different from AHFinderDirect's. I don't know the cause of this 
discrepancy. 

appears/disappears for each family of slices. To do this, I used the Cactus thorn IDAna- 
LYTICBH to construct the initial data slices, approximating the infinite sum (16) by its first 
30 terms. For each of a number of combinations of the CACTUS Cartesian grid spacing and 
the AHFinderDirect angular grid spacing,^^ I used a continuation-method binary search 
(described in detail in appendix C) to determine the critical parameter. I did convergence 
tests ([14]) in both grid spacings to verify that the values shown are reliable estimates of 
the true continuum values, and I used Richardson extrapolation in the angular grid spacing 
to improve the accuracy. Table II shows the results, together with values reported by [1] 
for comparison. The AHFinderDirect values are in excellent agreement with those of [1] , 
and are dramatically more accurate. 



As a final example, I consider the binary black hole collision evolution described in ta- 
ble III. Figure 3 shows the AH areas found by AHFinderDirect and the fast flow AH 
finder for this evolution. For the AHs they both find, the two AH finders agree very well. 
AHFinderDirect found the outer common AH somewhat sooner than the fast flow AH 
finder {t = 4.633 (3.64m) versus t = 5.50 (4.32m)), and was the only finder to find the inner 
common AH. Figure 4 shows the 3 AHs found by AHFinderDirect at two times during 
the evolution. 

For this evolution the mean CPU times per time step were 5.2 seconds for AHFind- 
erDirect and (for those time steps for which it ran) 55 seconds for the fast flow AH finder. 



Raising this to 50 terms changed the numerically computed critical by < 10 and the horizon area 



I used very high resolutions here, with grid spacings as small as 0.01 for the Cactus 3-D grid and 0.5° 

for the AHFinderDirect angular grid. 

Because the AHFinderDirect angular grid isn't commensurate with the Cactus Cartesian grid, the 
geometry-interpolation errors effectively have a quasirandom phase at each angular grid point. This 
prevents these errors from being smooth enough to allow Richardson extrapolation on the Cartesian grid 
spacing. However, the variation of the computed critical parameters with the Cartesian grid spacing can 
still be used qualitatively to help estimate the critical parameters' accuracy. 



C. Binary Black Hole Collision Spacetimes 



by < 10-^. 
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Initial data 

Coordinates 
Numerical grid 
Time integration 

Outer boundaries 



AHFinderDirect 



Fast flow AH finder 



Misner fi = 2.0 (ADM mass m = 1.272) 
black holes located on the z axis at z = ±0.99 
1 + log lapse, F- driver shift 

Axyz = 0.0666 (0.052m); +xyz octant symmetry 
iterated Crank-Nicholson (3 iterations) 
Courant number At/ Axyz = 0.25 
in the computational coordinates the outer boundaries 
were at xymax = 5.37 (4.22m), Zmax = 6.43 (5.05m); 
a "fisheye" nonuniform-grid transformation was used 
which placed the physical outer boundaries at 
xymaiK = 13.1 (10.3m), Zmax = 16.3 (12.8m) 

searching for individual and inner /outer common AHs at each time step 
cubic Hermite geometry interpolation 
angular resolution Apa = 5° for individual AH 

(-^ang = 580, +xy quadrant symmetry, local coordinate origin (0,0,0.85)) 
angular resolution Apa = 3° for inner and outer common AHs 

(^ang = 768, +xyz octant symmetry, local coordinate origin (0, 0, 0)) 
UMFPACK sparse matrix routines 

searching for individual and common AHs each 10 time steps 
fast flow method ([25]), spherical harmonics up to Imax = 10 
same local coordinate origins as AHFinderDirect 



TABLE HI: This table gives various parameters for the binary black hole collision evolution shown 
in figures 3 and 4. 

so despite searching for 3 AHs instead of 2, AHFinderDirect was about an order of 
magnitude faster than the fast flow method. 

Since the runs just described, I have changed AHFinderDirect's default geometry 
interpolator from cubic to quadratic Hermite. In practice AHFinderDirect is usually 
used in numerically computed slices whose geometries have numerical errors large enough 
to dominate AHFinderDirect's intrinsic errors. Thus (cf. section XA) the lower-order 
geometry interpolation makes little difference to the practical accuracy with which the ap- 
parent horizons are found, and it speeds up AHFinderDirect by roughly a factor of 2 to 3. 
For example, in a recent large binary-black-holc collision simulation (details of which will be 
reported elsewhere), AHFinderDirect (using the UMFPACK sparse matrix routines) 
averaged 1.7 seconds per time step, as compared with 61 seconds per time step for the fast 
flow AH finder. 



XI. CONCLUSIONS 

In this paper I present a detailed description of a new numerical apparent horizon finder 
for 3-dimensional Cartesian grids, AHFinderDirect. AHFinderDirect is typically 
at least an order of magnitude faster than other widely-used apparent horizon finders; in 
particular AHFinderDirect is fast enough that it's practical to find apparent horizons at 
each time step of a numerical evolution. This allows apparent horizon positions to readily 
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FIG. 3: This figure shows the areas of the various AHs in the Misner /x = 2.0 collision described 
in table III. The black points are the areas found by the fast flow AH finder; the other curves are 
all from AHFinderDirect. The gradual rise in the area of the outer common AH after t ~ 9, 
and in the area of the individual AH after t ~ 15, is due to outer boundary reflections making the 
overall evolution inaccurate. 

be used in coordinate conditions (see, for example, [46]) or for other diagnostic purposes. 

AHFinderDirect is also very accurate, typically finding apparent horizons to within 
~ 10~^m in coordinate position. 

AHFinderDirect is implemented within the Cactus computational toolkit, 
and is freely available (GNU GPL licensed) by anonymous CVS checkout from 
CVS . aei .mpg. de : /numrelcvs in the directory AEIThorns/AHFinderDirect. It would also 
be fairly easy to port AHFinderDirect to a different (non-CACTUS) numerical relativity 
code. 

APPENDIX A: DETAILS OF THE MULTIPLE PATCH SYSTEM 

1. Coordinates 

I define angular coordinates on S'^ based on rotation angles about the local xyz coordinate 
axes: 

yU = rotation angle about the local x axis = arctan(?//z) 

u = rotation angle about the local y axis = arctan(x/2;) (Al) 

= rotation angle about the local z axis = arctan(?//x) 




FIG. 4: This figure shows the three AHs in the Misner = 2.0 collision described in table III. 
Part (a) shows the horizons at t = 5.00 (3.93m); part (b) shows them at t = 8.00 (6.28m). In both 
parts the color coding matches that of figure 3. 

where all the arctangents are 4-quadrant based on the signs of x, y, and z. I then define 
coordinate patches covering neighborhoods of the ±z, ±x, and ±y axes, using the generic 
patch coordinates 

±z patch has generic patch coordinates (p, a) = {ji, u) 

±x patch has generic patch coordinates (p, a) = {u, 0) (A2) 

±y patch has generic patch coordinates (p, a) = (p, (p) 

The resulting set of 6 patches cover S"^ without coordinate singularities.^'^ Alternatively, 
if the slice has z ^ —z reflection symmetry about the local coordinate origin, then the 
5 patches +2;, ±x, and ±?/ cover the +2; hemisphere of 5*^. Similarly, suitable sets of 4 or 
3 patches may be used to cover quadrants or octants of S*^ respectively; figure 2 shows an 
example of this last case. 



^'^ Another way to visualize these patches and coordinates is to imagine an xyz cube with xyz grid lines 
painted on its face. Now imagine the cube to be flexible, and inflate it like a balloon, so it becomes 
spherical in shape. The resulting coordinate lines will closely resemble those for (/i, v, (f>) coordinates. 



20 



2. Ghost Zones 

Each patch is a rectangle in its own (p, a) coordinates; I use the usual "ghost zone" 
technique for handling finite differencing near the patch boundaries. I refer to the non- 
ghost-zonc part of a patch's grid as its "nominal" grid. Adjacent patches' nominal grids just 
touch. (Grid-function values in) the ghost zones are filled in from values in their own and 
other patches' nominal grids by symmetry operations and/or interpatch interpolations. 

With the coordinate choice (A2), adjacent patches always share the angular coordinate 
perpendicular to their mutual boundary, so the interpatch interpolations need only be done in 
one dimension, in the direction parallel to the boundary. Since off-centering an interpolant, 
particularly a high-order one, significantly degrades its accuracy, I have tried to design the 
algorithms to keep the interpolations centered wherever possible. 

The most comphcated part of the multiple-patch scheme is in the handling of the "corner" 
ghost-zone grid points, those ghost-zone grid points which are outside their patch's nominal 
grid in both the p and the a directions. Figure 5 shows the three basic cases: 

(a) Figure 5(a) shows an example of a corner between two symmetry ghost zones. In this 

case it takes 2 sequential symmetry operations (shown by the curved arrows) to fill 
in the corner from the nominal grid. Fortunately, symmetry operations commute, i.e. 
the results are independent of the order of the two symmetry operations. 

(b) Figure 5(b) shows an example of a corner between a symmetry and an interpatch ghost 

zone. To keep the interpolations centered, 1 use a 3-phase algorithm here: 

1. Use symmetry operations (for example, the one shown by the dotted arrow in 
the figure) to fill in the non-corner ghost-zone points in the neighboring patch. 

2. Do a centered interpatch interpolation from the neighboring patch to this patch; 
this interpolation may use some points from the neighboring patch's ghost zone. 

3. Use symmetry operations (for example, the one shown by the solid arrow in the 
figure) to fill in the corner ghost-zone points in this patch. 

(c) Figure 5(c) shows an example of a corner between two interpatch ghost zones (this only 

happens when 3 patches meet at a corner) . This case requires only a single interpatch 
interpolation for each ghost- zone grid point. 

AHFinderDirect actually uses the following 3-phase algorithm (which includes each 
of (a)-(c) above as special cases) to perform all the necessary symmetry operations and 
interpatch interpolations across all patches, in a correct order:^^ 

1. Use symmetry operations to fill in the non-corner parts of all symmetry ghost zones 
in all patches. 

2. Use interpatch interpolations to fill in all interpatch ghost zones in all patches. 



Another reason to keep the interpolations centered in AHFinderDirect was to allow re-use of the 
multiple-patch software from an earlier time evolution code ([50]), where centering the interpolations 

helped keep the evolution stable. 

The ordering of the phases is essential to obtain correct results, within each phase the different ghost 
zones and patches may be processed in any order. 
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3. Use symmetry operations to fill in the corners of all symmetry ghost zones in all 
patches. 



3. Jacobian Computation 

The symbolic-differentiation Jacobian (14) must be modified to take into account the 
ghost-zone symmetry operations and interpatch interpolations described in the previous 
subsection. This is essentially a straightforward application of the chain rule for each of the 
Du and terms in (14). Figure 6 shows the resulting algorithm in detail. 

APPENDIX B: MULTIPROCESSOR AND PARALLELIZATION ISSUES 

APPENDIX C: SEARCHING FOR THE CRITICAL PARAMETER OF A 
1-PARAMETER INITIAL DATA SEQUENCE 

In the interests of brevity these two appendices are omitted here. They appear as sup- 
plemental material in the online edition of this paper, and as appendices B and C in the 
preprint- archive version (gr-qc/0306056). 
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This patch: 

• nominal grid point 
o ghost-zone grid point 
@ ... which is an interpolation output 
symmetry operation 

Adjacent patch: 

X nominal grid point 

••• which is an interpolation input 
X ghost-zone grid point 
El ■•■ which is an interpolation input 
^ " ^ symmetry operation 



FIG. 5: This figure shows the combinations of symmetry operations and interpatch interpolations 
used to fill in grid function values in ghost-zone corners. Part (a) (where there are both x —x and 
y ^ —y reflection symmetries) shows a corner between two symmetry ghost zones. Part (b) (where 
there is a y —y reflection symmetry) shows a corner between a symmetry and an interpatch 
ghost zone; sample output points for each phase of the 3-phase algorithm described in the text 
are labelled as 1, 2, and 3. Part (c) (where there are no symmetries) shows a corner between two 
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matrix 

for each angular grid point I 
{ 

for each angular coordinate index u and pair of indices uv 
{ 

for each molecule index m G or respectively 

{ 

J ^ I + m 

^ rnhf""^"^ °' mJ^f''''^"^ respectively 
if (j G nominal grid of the patch containing l) 
then Jij <— Jij + temp 
else { 

for each angular grid point K used in computing h[j] 
via the 3-phase algorithm of appendix A 2 

{ 

JiK JiK + temp X ( -^TT-T for the 3-phase algorithm ) 

\dh[K\ } 

} 



} 

} 

Jii < Jii + 
} 



} 

e(/i + e)-G(/t) 



FIG. 6: This figure shows AHFinderDirect's overall Jacobian-computation algorithm, including 
ghost-zone handling. 
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